Abstract

A recent application of the Kolmogorov-Smirnov test to the WMAP 7 year W-band maps claims evidence that the
CMB is "weakly random", and that only 20% of the signal can be explained as a random Gaussian field. I here repeat
this analysis, and in contrast to the original result find no evidence for deviation from the standard ΛCDM model.
Instead, the results of the original analysis are consistent with not properly taking into account the correlations of the
ΛCDM power spectrum.
1. Introduction

In astronomical data analysis, it is often useful to be able 
to test whether a set of data points follows a given dis- 
tribution or not. For example, many analysis techniques 
depend on instrument noise being Gaussian, and to avoid 
bias, one must check that this actually is the case. There are 
many different ways in which two distributions can differ, 
and correspondingly many different ways to test them for 
equality. The simplest ones, such as comparing the means 
or variances of the distributions, suffer from the problem 
that there are many ways in which distributions can differ 
that they cannot detect no matter how many samples are 
available. For example, samples from a uniform distribution 
can easily pass as Gaussian if one only considers the mean 
and variance. 

The popular Kolmogorov-Smirnov test (K-S test) re- 
solves this problem by considering the cumulative distribu- 
tion functions (CDF) instead: Construct the empirical CDF 
of the data points and find its maximum absolute difference 
K from the theoretical CDF. Due to the limited number of 
samples, the empirical CDF will be noisy, and K will there- 
fore be a random variable with its own CDF, which in the 
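limit where the number of samples goes to infinity is given
by

$$ P(x < K) = F_{KS}(\sqrt{n}\,K) \qquad (1) $$

with

$$ F_{KS}(x) = 1 - 2 \sum_{k=1}^{\infty} (-1)^{k-1} e^{-2 k^2 x^2}, \qquad (2) $$

where n is the number of samples.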



In contrast with the simplest tests, this test can detect any 
deviation in the distributions, but may require a large num- 
ber of samples to do so, especially in the tails of the distri- 
bution. 
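The procedure described above can be sketched in a few lines of Python. This is only an illustration using the standard asymptotic Kolmogorov series; the function names (`kolmogorov_cdf`, `ks_statistic`, `ks_probability`) are hypothetical and not taken from the analysis in this paper.

```python
import math

def kolmogorov_cdf(x):
    """Asymptotic Kolmogorov CDF: 1 - 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 x^2)."""
    if x < 0.04:
        return 0.0  # the CDF is far below double precision here
    total = 0.0
    k = 1
    while True:
        term = math.exp(-2.0 * k * k * x * x)
        if term < 1e-17:
            break
        total += term if k % 2 == 1 else -term
        k += 1
    return min(1.0, max(0.0, 1.0 - 2.0 * total))

def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of the Gaussian distribution N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def ks_statistic(samples, cdf):
    """Maximum absolute difference K between the empirical and theoretical CDF."""
    xs = sorted(samples)
    n = len(xs)
    k_max = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x, so check both sides.
        k_max = max(k_max, (i + 1) / n - f, f - i / n)
    return k_max

def ks_probability(samples, cdf):
    """P(x < K) in the large-sample limit, with n = len(samples)."""
    n = len(samples)
    return kolmogorov_cdf(math.sqrt(n) * ks_statistic(samples, cdf))
```

A call such as `ks_probability(values, normal_cdf)` then returns a number near 1 when the empirical CDF deviates more from the Gaussian CDF than expected from sampling noise, and near 0 when it matches suspiciously well.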

Recently, a series of papers (Gurzadyan et al. 2011;
Gurzadyan & Kocharyan 2008; Gurzadyan et al. 2010)
has applied this test to WMAP's cosmic microwave
background (CMB) maps, resulting in the remarkable claim
that the CMB is "weakly random", with only 20% of the
CMB signal behaving as one would expect from a random
Gaussian field. This result went on to be used in
a much discussed series of papers (Gurzadyan & Penrose
2010a,b, 2011) claiming a strong detection of concentric
low-variance circles in the CMB, which was taken as
evidence for Conformal Cyclic Cosmology. Other groups failed
to significantly detect the circles (Wehus & Eriksen 2010;
Moss et al. 2011; Hajian 2010). The difference in
significance was due to different CMB models: Wehus & Eriksen
(2010), Moss et al. (2011) and Hajian (2010) used realizations
of the best-fit ΛCDM power spectrum, while Gurzadyan &
Penrose (2010b) used a "weakly random" CMB model.

Both in order to resolve this issue, and because a weakly
random universe would be a strong blow against the ΛCDM
model in its own right, it is important to test this result.

2. Method 

Before applying the K-S test, one must be aware of its
limited area of validity: Equation (1) requires an infinite
number of independent, identically distributed samples, while
CMB maps actually consist of a limited number of
correlated samples. However, both the correlations and number
of samples can be compensated for, as we shall see.

2.1. Application of the K-S test to correlated data 

Though the K-S test is not immediately applicable to a 
correlated data set, it is possible to perform an equivalent 
test on a transformed set of samples. The question we are 
trying to answer with the K-S test is "Do the samples follow 
the theoretical distribution?". The truth or falseness of this 
is preserved if we apply the same transformation to both 
the samples and the distribution we test them against, and 
to be able to use the K-S test, the logical transformation 
to use is a whitening transformation, which results in an
independent, identical distribution for the samples. With
original samples d with covariance matrix C, the whitened
(uncorrelated with unit variance) samples r are given by:

$$ r = C^{-1/2} d \qquad (3) $$

Thus, to test whether the data points d ~ N(0, C), we can
test the equivalent hypothesis r ~ N(0, 1).

Figure 1. ΛCDM two point correlation function after
applying the WMAP W-band beam and the HEALPix
(Górski et al. 2005) nside 512 pixel window.
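As a minimal sketch of the whitening step in equation (3), the following Python code forms the symmetric inverse square root of C from its eigendecomposition. The exponentially decaying covariance in `demo` is a made-up stand-in used only for illustration, not the CMB covariance, and the function names are hypothetical.

```python
import numpy as np

def whiten(d, C):
    """Return r = C^{-1/2} d, using the symmetric inverse square root of C."""
    eigval, eigvec = np.linalg.eigh(C)           # C is symmetric positive definite
    inv_sqrt = eigvec @ np.diag(eigval ** -0.5) @ eigvec.T
    return inv_sqrt @ d

def demo(n=50, n_draws=2000, seed=0):
    """Draw correlated samples, whiten them, and return both sample covariances."""
    rng = np.random.default_rng(seed)
    idx = np.arange(n)
    # Hypothetical covariance for illustration: exponentially decaying correlation.
    C = np.exp(-np.abs(idx[:, None] - idx[None, :]) / 5.0)
    L = np.linalg.cholesky(C)
    d = L @ rng.standard_normal((n, n_draws))    # each column is a draw from N(0, C)
    r = whiten(d, C)
    return np.cov(d), np.cov(r)
```

The sample covariance of the whitened draws should be close to the identity matrix, while that of the raw draws retains the off-diagonal structure of C. Any matrix W with W C Wᵀ = 1 (for example the inverse Cholesky factor) would serve equally well for the K-S test; the symmetric form is used here only because it matches equation (3).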

In the case of CMB maps, both the data itself and the
noise are expected to be Gaussian, so the obvious theoretical
distribution here is N(0, S + N), where the CMB signal
covariance matrix S is given by the two-point correlation
function:

$$ S_{ij} = \sum_{\ell} \frac{2\ell+1}{4\pi} C_\ell B_\ell^2 P_\ell(\cos(|p_i - p_j|)) \qquad (4) $$

P_ℓ(x) are the Legendre polynomials normalized so that P_ℓ(1) = 1, and
p_i and p_j are the direction vectors for pixel i and j in the
disk. C_ℓ is the ΛCDM angular power spectrum, while B_ℓ
accounts for the beam and pixel window. The noise covariance N is
instrument dependent, but for the WMAP W-band CMB map we will
use here, the noise is nearly diagonal, and given by the
corresponding W-band RMS map.
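The signal covariance can be evaluated numerically along the following lines. This sketch rests on several assumptions: the pixel directions are given as unit vectors, so that the cosine of their angular separation is their dot product; the window enters as B_ℓ², i.e. `bl` is a map-level transfer function; and the sum is truncated at the length of the input arrays. The function name `signal_covariance` is hypothetical.

```python
import numpy as np
from numpy.polynomial.legendre import legval

def signal_covariance(vecs, cl, bl):
    """S_ij = sum_l (2l+1)/(4 pi) * C_l * B_l^2 * P_l(cos theta_ij).

    vecs : (npix, 3) array of unit direction vectors for the pixels
    cl   : angular power spectrum C_l for l = 0 .. lmax
    bl   : beam and pixel window B_l for l = 0 .. lmax (assumed map-level)
    """
    vecs = np.asarray(vecs, dtype=float)
    cl = np.asarray(cl, dtype=float)
    bl = np.asarray(bl, dtype=float)
    # Cosine of the angular separation between every pair of pixels.
    cos_theta = np.clip(vecs @ vecs.T, -1.0, 1.0)
    ell = np.arange(len(cl))
    coeff = (2 * ell + 1) / (4 * np.pi) * cl * bl ** 2
    # legval evaluates sum_l coeff[l] * P_l(x) with the standard P_l(1) = 1.
    return legval(cos_theta, coeff)
```

For a diagonal noise model, the full covariance would then be `signal_covariance(vecs, cl, bl) + np.diag(rms ** 2)`, where `rms` is an assumed array holding the per-pixel noise RMS.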



2.2. Application of the K-S test with few samples 

The other problem we need to account for is our finite
number of samples. In this case equation (1) is only
approximate. For most uses of the test, this approximation is
good enough, especially when employing analytical
expressions for improving the quality of the approximation for low
numbers of samples (von Mises 1964). For example, when
performing a single test to accept or reject a test
distribution, a bias of a few percent in the confidence with which
the hypothesis is rejected is not important.

However, when making statistics for a large number of
such test results, such a bias may make the results
ambiguous. Given a set of experiments with a corresponding set of
maximum deviations {K_i}, the corresponding probabilities
{p_i = P(x < K_i)} should be uniformly distributed if the
samples actually follow the theoretical distribution, and a
histogram of {p_i} should therefore be flat. Deviations from
this indicate that the theoretical distribution does not
accurately describe the samples. However, the approximate
equation (1) also introduces a small non-uniformity in {p_i}
even if the samples actually do follow the distribution. To
avoid the ambiguity this causes, we will instead compute a
numerical correction function mapping the approximate p
to the true p'.^1

Figure 2. When applying the K-S test to samples known
to come from the correct distribution, the resulting
values {p_i = P(x < K_i)} should be uniformly distributed,
but when working with a limited number of samples,
the Kolmogorov distribution is only approximate, and the
actual CDF of the results, G(p), differs from the ideal
G_∞(p) = p. This is shown in the upper panel for the case
of 540 samples per experiment, where G(p) is the solid line
and G_∞(p) is dashed. The lower panel shows the deviation
between the two, which is of the order of 1% in this case
(but larger with fewer samples).

To build up the correction, we simulate a large number^2
of experiments, each with the same number of samples as
the actual data set, but drawn directly from the theoretical
distribution. Thus, for these, {p_i} should be uniform, with a
CDF of G_∞(p) = p. However, since equation (1) is inexact
for small numbers of samples, the actual CDF is G(p) ≠
G_∞(p). The mapping between the approximate p and true
p' is given by G(p) = G_∞(p') ⇒ p' = G_∞^{-1}(G(p)) = G(p).



Thus, for a limited number of samples

$$ P(x < K) = G(F_{KS}(\sqrt{n}\,K)) \qquad (5) $$

Figure 2 illustrates the correction function for 5 • 10^ simulations
of 540 samples each. For this many samples, the correction
is only of the order of 1%.
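A minimal sketch of how such a correction function G could be built by simulation is given below. It assumes the theoretical distribution is N(0, 1) and uses the empirical CDF of the simulated p values as G. The names `build_correction` and `ks_probability` are hypothetical, and the number of simulations in any real use should be far larger than in a quick test.

```python
import bisect
import math
import random

def kolmogorov_cdf(x):
    """Asymptotic Kolmogorov CDF (alternating exponential series)."""
    if x < 0.04:
        return 0.0
    total, k = 0.0, 1
    while True:
        term = math.exp(-2.0 * k * k * x * x)
        if term < 1e-17:
            break
        total += term if k % 2 == 1 else -term
        k += 1
    return min(1.0, max(0.0, 1.0 - 2.0 * total))

def ks_probability(samples):
    """Approximate p = F_KS(sqrt(n) K) for the hypothesis samples ~ N(0, 1)."""
    xs = sorted(samples)
    n = len(xs)
    k_max = 0.0
    for i, x in enumerate(xs):
        f = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
        k_max = max(k_max, (i + 1) / n - f, f - i / n)
    return kolmogorov_cdf(math.sqrt(n) * k_max)

def build_correction(n_samples, n_sims, seed=0):
    """Return G, the empirical CDF of p over simulations drawn from N(0, 1)."""
    rng = random.Random(seed)
    ps = sorted(
        ks_probability([rng.gauss(0.0, 1.0) for _ in range(n_samples)])
        for _ in range(n_sims)
    )

    def G(p):
        # Fraction of simulated experiments with an approximate p at or below p.
        return bisect.bisect_right(ps, p) / len(ps)

    return G
```

The corrected probability for a data set with approximate value p is then simply `G(p)`, with G built for the same number of samples per experiment as the data.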



^1 What we do here is essentially replacing the analytical
Kolmogorov distribution (equation (1)) with a numerical
distribution. This could also be done without using the analytical
distribution as a basis, at a small cost in clarity.

^2 The number necessary depends on the level of accuracy
desired. The noise in the estimate of G(p) propagates to the
final results. To make this a subdominant noise contribution,
the number of simulations should be at least as large as the
number of actual experiments, preferably much higher.
Figure 3. A randomly selected disk before (left) and after
(right) the whitening operation. The samples are strongly
correlated and thus unsuitable for the K-S test before the
transformation, but afterwards no correlations are
visible and the variance is 1. Note that whitening the data
does not mean that we are "forcing" the K-S test to pass.
The whitened data will only end up matching N(0, 1)
after whitening if they followed our theoretical distribution
N(0, C) before.



3. Does ΛCDM fail the K-S test?

With this in hand we can finally apply the K-S test on CMB
data. Following Gurzadyan et al. (2011), we randomly pick
10 000 disks with a radius of 1.5 degrees from the WMAP
7 year W-band map (Jarosik et al. 2011), with the region
within 30 degrees from the galactic equator excluded. Each
disk contains on average 540 pixels, which are whitened
using equation (3). A typical disk before and after the
whitening operation can be seen in Fig. 3. After whitening, the
values should follow the distribution N(0, 1) if our model
is correct.

The histogram of resulting probabilities {p_i = P(x <
K_i)} from applying equation (5) to the hypothesis r ~
N(0, 1) is shown in Fig. 4, together with the 68% and
95% intervals from 300 simulations. The data and
simulations are consistent, and follow a uniform distribution as
expected.^3 The CMB map is fully consistent with ΛCDM
+ WMAP noise as far as the K-S test is concerned.

This is dramatically different from the curve found by
Gurzadyan et al. (2011), which was strongly biased towards
low values. Low values of P(x < K) would mean that the
empirical CDF of the samples matches the theoretical one
too well, i.e. even better than samples drawn directly from
the theoretical distribution.

^3 It should be noted that the histogram bins are not
completely independent for two reasons: Firstly, some disks are
going to overlap, meaning that the same samples enter into several
different K-S tests, and secondly, while our transformation has
made the samples within each disk independent, the correlation
between different disks is still present.

Figure 4. Histogram of results of the K-S test. Each panel
compares the results from properly taking the correlations
into account (solid line) with those one gets from
ignoring them (dashed line), together with 68% and 95%
intervals (dotted lines) from simulations. The upper panel
corresponds to using samples further than 30 degrees away from
the galactic equator, while the lower panel instead uses the
WMAP KQ85 analysis mask. In both cases, both the map
and the simulations pass the K-S test when taking the
correlations into account, while if they are ignored, the K-S test
fails in the same way Gurzadyan et al. (2011) reported.

What could cause Gurzadyan et al. to get results so
different from ours? One way of biasing P(x < K) low is by
basing the parameters of your test distribution on the values
themselves. However, even without doing this, it is possible
to get low values if the values used in the K-S test are
correlated. This is also consistent with the presentation given
by Gurzadyan et al. (2011), who apparently applied the K-S
test directly to the raw samples d, or equivalently, modeled
the pixel values as coming from a 1-dimensional
distribution. To check this, I repeated the analysis, this time
using the theoretical distribution d ~ N(μ, σ²), where μ
and σ² are the measured mean and variance of the samples
in the disk. The result is also shown in Fig. 4. This time,
the bias towards low values is clearly recreated.
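The first mechanism mentioned above, basing the parameters of the test distribution on the samples themselves, can be illustrated with a small toy simulation. Note the limits of this sketch: it uses independent Gaussian samples only, so it does not reproduce the effect of the CMB correlations or the analysis behind Fig. 4, and the function names are hypothetical.

```python
import math
import random

def kolmogorov_cdf(x):
    """Asymptotic Kolmogorov CDF (alternating exponential series)."""
    if x < 0.04:
        return 0.0
    total, k = 0.0, 1
    while True:
        term = math.exp(-2.0 * k * k * x * x)
        if term < 1e-17:
            break
        total += term if k % 2 == 1 else -term
        k += 1
    return min(1.0, max(0.0, 1.0 - 2.0 * total))

def ks_probability(samples, mu, sigma):
    """p = F_KS(sqrt(n) K) for the hypothesis samples ~ N(mu, sigma^2)."""
    xs = sorted(samples)
    n = len(xs)
    k_max = 0.0
    for i, x in enumerate(xs):
        f = 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))
        k_max = max(k_max, (i + 1) / n - f, f - i / n)
    return kolmogorov_cdf(math.sqrt(n) * k_max)

def mean_p(n_samples, n_experiments, fit_parameters, seed=0):
    """Average p over many experiments with samples drawn from N(0, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_experiments):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        if fit_parameters:
            # Test against N(mu, sigma^2) measured from the samples themselves.
            mu = sum(x) / n_samples
            var = sum((v - mu) ** 2 for v in x) / (n_samples - 1)
            sigma = math.sqrt(var)
        else:
            # Test against the true distribution the samples were drawn from.
            mu, sigma = 0.0, 1.0
        total += ks_probability(x, mu, sigma)
    return total / n_experiments
```

With known parameters the average p should be close to 0.5, as expected for uniformly distributed p values, while with fitted parameters it falls well below 0.5 because the fitted distribution follows the samples more closely than the true one does.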

It therefore seems likely that Gurzadyan et al.'s
reported "weak randomness" is the result of not properly
taking the CMB's correlations into account. One is, of course,
free to use whatever distribution one wants as the
theoretical distribution in a K-S test, even a model where the
CMB pixels are independent and identically distributed, with
no correlations at all. The problem lies in the
interpretation of the test results. For Gurzadyan et al. (2011), the K-S
test results are clearly not uniform, indicating that the
chosen theoretical distribution has been disproven. However,
Gurzadyan et al. then go on to create a set of simulations
(linear combinations of 20% Gaussian and 80% static
signal) that fail the test in the same way as the WMAP map
does. But having two sets of samples fail the K-S test the
same way does not prove that they have the same
properties. It simply means that the chosen test distribution was
a poor choice.



4. Kolmogorov maps 



While Gurzadyan et al.'s Kolmogorov statistics are
biased by not taking the correlations into account, the
approach of making sky maps of K-S test results introduced in
Gurzadyan et al. (2009) is still an interesting way to search
for regions of the sky that do not follow the expected
distribution. Making an unbiased Kolmogorov map
straightforwardly follows the procedure in Sect. 3, with the main
difference being the selection of pixels. Instead of randomly
selecting disks, we now systematically go through nside 16
pixels, using the 1024 nside 512 subpixels inside each one
as the samples. These are then tested against N(0, C) by
whitening them via equation (3) and then comparing the
whitened samples to N(0, 1).
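The pixel selection described above can be sketched as follows, assuming the nside 512 map is stored in HEALPix NESTED ordering, where the 1024 subpixels of nside 16 pixel p occupy the contiguous index range 1024·p to 1024·p + 1023. The function name `group_subpixels` is hypothetical, and a map in RING ordering would first have to be reordered.

```python
import numpy as np

NSIDE_LOW = 16    # resolution of the Kolmogorov map
NSIDE_HIGH = 512  # resolution of the input map

def group_subpixels(map_nested):
    """Reshape a NESTED nside 512 map to (3072, 1024).

    Row p holds the values of the 1024 nside 512 subpixels inside
    nside 16 pixel p, which serve as the samples for one K-S test.
    """
    npix_low = 12 * NSIDE_LOW ** 2                 # 3072 coarse pixels
    per_pixel = (NSIDE_HIGH // NSIDE_LOW) ** 2     # 1024 subpixels each
    m = np.asarray(map_nested)
    if m.size != npix_low * per_pixel:
        raise ValueError("expected an nside 512 map with 12 * 512**2 pixels")
    return m.reshape(npix_low, per_pixel)
```

Each row would then be whitened with the covariance matrix of its 1024 subpixels and passed through the K-S test, yielding one value of P(x < K) per nside 16 pixel.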

The result is the nside 16 map of P(x < K) shown in
Fig. 5. Regions that pass the test have a value uniformly
distributed between 0 and 1, and we see that this applies
to the CMB-dominated areas of the sky, while areas
dominated by the galaxy fail the test as expected.

For comparison, Fig. 5 also includes the result of
making the same map while ignoring correlations. In this case,
the whole sky fails the test: The CMB-dominated areas are
biased low, while the galaxy is biased high. This map is
similar to the map in Gurzadyan et al. (2009), which is also
too low outside the galaxy, and too high inside, which is,
again, consistent with Gurzadyan et al. applying the K-S
test directly to the raw samples.



5. Summary 

The Kolmogorov-Smirnov test is a useful and general way
of testing whether a data set follows a given distribution or
not. However, it only applies to independent, identically
distributed samples. The CMB is strongly correlated, and
thus not immediately compatible with the test. However,
this can be resolved by the application of a whitening
transformation, replacing the hypothesis d ~ N(0, C) with the
equivalent C^{-1/2} d ~ N(0, 1). With this, we find that
ΛCDM passes the K-S test. This is incompatible with the
original analysis by Gurzadyan et al. (2011), which claimed
detection of an unknown non-random component making
up 80% of the CMB based on the CMB failing the K-S test
there. It turns out that this analysis did not take the CMB
correlations into account, which we confirm by producing
the same failure of the K-S test when we skip the whitening
step. When the correlations are handled properly, there is
no need for a weakly random universe.




Figure 5. nside 16 map of P(x < K) based on pixels from
the WMAP 7 year nside 512 W-band map. This time, the
K-S test is performed on the 1024 nside 512 pixels inside
each nside 16 pixel instead of on disks. The upper panel
uses N(0, S + N) as the theoretical distribution (by testing
the whitened data against N(0, 1)), while the lower panel
ignores the correlations, instead using N(μ, σ²), where μ
and σ are the measured mean and standard deviation of
the samples. The former passes the test outside the galactic
plane, while the latter fails everywhere, being biased low
outside the galaxy.